Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors

Authors

Jeffrey Johnson

Scott J. Krieder

Benjamin Grimmer

Justin M. Wozniak

Michael Wilde

Ioan Raicu

Abstract

Many-Task Computing (MTC) aims to bridge the gap between HPC and HTC. MTC emphasizes running many computational tasks over a short period of time, where tasks can be either dependent on or independent of one another. MTC has been well supported on clouds, grids, and supercomputers with traditional computing architectures, but the abundance of hybrid large-scale systems using accelerators has motivated us to explore support for MTC on the new Intel Xeon Phi accelerators. The Xeon Phi is a PCI-Express-based expansion card comprising 60 cores that support 240 hardware threads and deliver up to 1 teraflop of double-precision performance in a single accelerator. These cards are already being integrated into supercomputing clusters such as Stampede, which hosts over 6,400 Xeon Phi accelerators totaling over 7 petaflops of double-precision performance. This work provides an in-depth understanding of MTC on the Intel Xeon Phi and presents our preliminary results from running several different workloads on pre-production Intel Xeon Phi hardware. By using Intel's SCIF protocol to communicate across the PCI-Express bus, our batch framework achieves over 90% efficiency for tasks longer than 300 µs, coming near to or outperforming OpenMP offloading. This performance opens the opportunity to develop a framework for executing heterogeneous tasks on the Xeon Phi alongside other potential accelerators, including graphics cards, for MTC applications. Our framework will provide fine granularity for executing MTC applications across large-scale compute clusters. It will be integrated with our existing graphics card framework, GeMTC, to provide transparent access to GPUs, Xeon Phis, and future generations of accelerators to help bridge the gap into Exascale computing.

Keywords: MIMD, MTC, Accelerator, Intel Xeon Phi, Coprocessor
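To make the efficiency figure concrete, the sketch below uses a simple model that is an assumption of this note and is not taken from the paper: efficiency is useful task time divided by task time plus a fixed per-task dispatch overhead. Under that assumed model, 90% efficiency at a 300 µs task length corresponds to an overhead budget of roughly 33 µs per task. The function names are hypothetical.

```python
# Assumed model (not from the paper): efficiency = task / (task + overhead),
# with a fixed per-task dispatch overhead measured in microseconds.

def efficiency(task_us: float, overhead_us: float) -> float:
    """Fraction of wall time spent on useful work for one task."""
    return task_us / (task_us + overhead_us)


def implied_overhead_us(task_us: float, target_efficiency: float) -> float:
    """Per-task overhead at which a task of length task_us hits the target."""
    return task_us * (1.0 / target_efficiency - 1.0)


if __name__ == "__main__":
    # Overhead budget implied by the abstract's 300 us / 90% operating point.
    budget = implied_overhead_us(300.0, 0.90)
    print(f"overhead budget: {budget:.1f} us")
    # Under the same model, shorter tasks lose efficiency and longer ones gain.
    for task_us in (50.0, 300.0, 1000.0):
        print(f"{task_us:7.1f} us task -> {efficiency(task_us, budget):.3f}")
```

The model ignores batching and overlap of transfers with computation, both of which a real framework can use to amortize overhead, so it is only a rough way to reason about task granularity.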

Similar resources

Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment

Many of the heterogeneous resources available to modern computers are designed for different workloads. In order to efficiently use GPU resources, the workload must have a greater degree of parallelism than a workload designed for multicore CPUs. And conceptually, the Intel Xeon Phi coprocessors are capable of handling workloads somewhere in between the two. This multitude of applicable workloads wi...

First experiences with the Intel MIC architecture at LRZ

With the rapidly growing demand for computing power, new accelerator-based architectures have entered the world of high performance computing over roughly the past 5 years. In particular, GPGPUs have recently become very popular; however, programming GPGPUs using programming languages like CUDA or OpenCL is cumbersome and error-prone. Trying to overcome these difficulties, Intel developed their own Many Int...

Exploring SIMD for Molecular Dynamics, Using Intel

We analyse gather-scatter performance bottlenecks in molecular dynamics codes and the challenges that they pose for obtaining benefits from SIMD execution. This analysis informs a number of novel code-level and algorithmic improvements to Sandia’s miniMD benchmark, which we demonstrate using three SIMD widths (128-, 256- and 512-bit). The applicability of these optimisations to wider SIMD is discu...

Matrix factorization routines on heterogeneous architectures

In this work we consider a method for parallelizing matrix factorization algorithms on systems with Intel® Xeon Phi™ coprocessors. We provide performance results of matrix factorization routines implementing this approach and available in Intel® Math Kernel Library (Intel MKL) on the Intel® Xeon® processor line with Intel Xeon Phi coprocessors. Summary: New heterogeneous systems consisting...

Task-Based Cholesky Decomposition on Knights Corner Using OpenMP

The growing popularity of the Intel Xeon Phi coprocessors and the continued development of this new many-core architecture have created the need for an open-source, scalable, and cross-platform task-based dense linear algebra package that can efficiently use this type of hardware. In this paper, we examined the design modifications necessary when porting PLASMA, a task-based dense linear algebra...

Publication date: 2013
